Weighted Finite-State Morphological Analysis of Finnish Inflection and Compounding
Abstract
Finnish has highly productive compounding and a rich inflectional system, which makes the morphological segmentation of compounds ambiguous when it is performed with finite-state transducer methods. To disambiguate the compound segmentations, we compare three different strategies, which we cast in a probabilistic framework. We present a method for implementing the probabilistic framework as part of the process of building lexc-style morpheme sub-lexicons, which yields weighted lexical transducers. To implement the structurally disambiguating morphological analyzer, we use the HFST-LEXC tool, which is part of the open-source Helsinki Finite-State Technology. This is the first time all three principles have been cast in a probabilistic framework and compared on the same corpus using one tool. On our Finnish test corpus, the best method achieves 99.98% precision and recall.
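As a rough illustration of weight-based disambiguation of compound segmentations, the sketch below scores each candidate segmentation by summing per-part weights and keeps the lowest total, as a weighted transducer over the tropical semiring would. It is a minimal sketch under stated assumptions, not the paper's implementation: the counts are invented, the helper names (`weight`, `segmentations`, `best_segmentation`) are hypothetical, and the negative-log-frequency weighting is only one possible scheme, not necessarily any of the three strategies compared.

```python
import math

# Hypothetical corpus counts for Finnish word forms; the values are
# invented for illustration and do not come from the paper's corpus.
COUNTS = {"maa": 500, "ilman": 300, "maailman": 400, "kartta": 200}
TOTAL = sum(COUNTS.values())


def weight(part):
    """Weight of one part: negative log of its relative frequency."""
    return -math.log(COUNTS[part] / TOTAL)


def segmentations(word):
    """Yield every way to split `word` into parts listed in COUNTS."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in COUNTS:
            for rest in segmentations(word[i:]):
                yield [head] + rest


def best_segmentation(word):
    """Return (parts, total weight) for the lowest-weight segmentation.

    Raises ValueError if the word has no segmentation at all.
    """
    scored = [(parts, sum(weight(p) for p in parts))
              for parts in segmentations(word)]
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for parts in segmentations("maailmankartta"):
        print(parts, round(sum(weight(p) for p in parts), 3))
    print("best:", best_segmentation("maailmankartta")[0])
```

With these invented counts, "maailmankartta" has two candidate segmentations, maa + ilman + kartta and maailman + kartta, and the two-part reading receives the lower total weight. Because every part contributes a positive weight, summing weights of this kind tends to favour readings with fewer parts.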
Related articles
Weighted Finite-State Morphological Analysis of Finnish Compounding with HFST-LEXC
Finnish has very productive compounding and a rich inflectional system, which causes ambiguity in the morphological segmentation of compounds made with finite-state transducer methods. In order to disambiguate the compound segmentations, we compare three different strategies, which are all cast in the same probabilistic framework and compared for the first time. We present a method for implem...
Weighting Finite-State Morphological Analyzers using HFST Tools
In a language with very productive compounding and a rich inflectional system, e.g. Finnish, new words are to a large extent formed by compounding. In order to disambiguate between the possible compound segmentations, a probabilistic strategy has been found effective by Lindén and Pirinen [7]. In this article, we present a method for implementing the probabilistic framework as a separate proces...
Evaluation of Finite State Morphological Analyzers Based on Paradigm Extraction from Wiktionary
Wiktionary provides lexical information for an increasing number of languages, including morphological inflection tables. It is a good resource for automatically learning rule-based analysis of the inflectional morphology of a language. This paper performs an extensive evaluation of a method to extract generalized paradigms from morphological inflection tables, which can be converted to weighte...
Complexity, two-level morphology and Finnish
The two-level model provides a language-independent framework for describing phonological and morphological phenomena associated with word inflection, derivation and compounding. The model can be expressed in terms of finite-state machines, and it is easy to implement. The model has, in fact, two aspects: (1) it is a linguistic formalism for describing phonological phenomena, and (2) it is ...
A Modular Approach to Turkish Noun Compounding: The Integration of a Finite-State Model
In this paper, we describe the design and integration of a three-level cascaded non-deterministic finite-state model of Turkish compounding into Turkish PAPPI, a comprehensive syntactic parser in the principles-and-parameters (P&P) framework. Our approach is to handle compounding as an intermediate stage between morphological analysis and syntactic parsing. We discuss how the compounding machine ...
Publication date: 2009